The credit assignment problem for neural networks refers to evaluating the contribution of each network component to the final output. For untrained neural networks, approaches to this problem have contributed greatly to parameter updates and model evolution during the training phase. For trained neural networks, however, the problem has received little attention, even though it plays an increasingly important role in neural network patching, specification, and verification. Based on Koopman operator theory, this paper presents an alternative, linear-dynamics perspective on the credit assignment problem for trained neural networks. Regarding a neural network as a composition of a series of sub-dynamics, we utilize step-delay embedding to capture snapshots of each component, characterizing the established mapping as exactly as possible. To circumvent the dimension-mismatch problem encountered during the embedding, we carefully design a composition and decomposition of an auxiliary linear layer, termed minimal linear dimension alignment, with rigorous formal guarantees. Afterwards, each component is approximated by a Koopman operator, from which we derive the Jacobian matrix and its determinant, analogous to backward propagation. We can then define a metric with algebraic interpretability for the credit assignment of each network component. Experiments conducted on typical neural networks demonstrate the effectiveness of the proposed method.
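As a minimal sketch of the component-wise approximation idea, a finite-dimensional Koopman operator can be fitted to input/output snapshot pairs of a component by least squares, K = Y X⁺, after which the Jacobian and its determinant are read off directly. The toy 3-dimensional linear component and function names below are illustrative assumptions, not the paper's step-delay embedding pipeline.

```python
import numpy as np

def fit_koopman_operator(X, Y):
    """Fit K minimizing ||K X - Y||_F, where the columns of X are input
    snapshots of a component and the columns of Y are its outputs."""
    # K = Y X^+ (Moore-Penrose pseudoinverse), the standard DMD-style fit.
    return Y @ np.linalg.pinv(X)

# Toy component: an exactly linear map, so the fit is exact and checkable.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 200))           # 200 snapshots in R^3
A = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.2],
              [0.1, 0.0, 0.7]])
Y = A @ X                               # outputs of the component

K = fit_koopman_operator(X, Y)
# For a linear component, K recovers A; its Jacobian is K itself, and
# det(K) gives a volume-scaling notion of this component's credit.
print(np.allclose(K, A))                # True: exact recovery
print(np.linalg.det(K))
```

For a nonlinear component, the same fit would be applied to snapshots lifted by observables (as in extended DMD), and the determinant would be evaluated locally.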
Speech representation learning has improved both speech understanding and speech synthesis tasks for a single language. However, its capability in cross-lingual scenarios has not been explored. In this paper, we extend the pretraining method to cross-lingual multi-speaker speech synthesis tasks, including cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing. We propose a speech-text joint pretraining framework, in which we randomly mask the spectrogram and the phonemes given a speech example and its transcription. By learning to reconstruct the masked parts of the input in different languages, our model shows great improvements over speaker-embedding-based multi-speaker TTS methods. Moreover, our framework is end-to-end for both training and inference, without any finetuning effort. Experiments on both cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing confirm that our model outperforms speaker-embedding-based multi-speaker TTS methods. The code and model are publicly available in PaddleSpeech.
On-device end-to-end (E2E) models have shown improvements over conventional models on English voice search tasks in both quality and latency. E2E models have also shown promising results for multilingual automatic speech recognition (ASR). In this paper, we extend our previous capacity solution to streaming applications and present a streaming multilingual E2E ASR system that runs fully on device, with quality and latency comparable to individual monolingual models. To achieve this, we propose an encoder endpointer model and an end-of-utterance (EOU) joint layer for a better quality and latency trade-off. Our system is built in a language-agnostic manner, allowing it to natively support intersentential code switching in real time. To address the feasibility concerns of large models, we conducted on-device profiling and replaced the time-consuming LSTM decoder with the recently developed Embedding decoder. With these changes, we managed to run such a system on a mobile device in less than real time.
Text matching is a fundamental technique in both information retrieval and natural language processing. Text matching tasks share the same paradigm of determining the relationship between two given texts. The relations vary by task, e.g., relevance in document retrieval, semantic alignment in paraphrase identification, and answerability judgment in question answering. However, the essential signals for text matching remain in a finite scope, i.e., exact matching, semantic matching, and inference matching. Ideally, a good text matching model should learn to capture and aggregate these signals across different matching tasks to achieve competitive performance, yet recent state-of-the-art text matching models, e.g., pre-trained language models (PLMs), are hard to generalize. This is because end-to-end supervised learning on task-specific datasets makes the model overemphasize data sample bias and task-specific signals instead of the essential matching signals. To overcome this problem, we adopt a specialization-generalization training strategy and refer to it as Match-Prompt. In the specialization stage, descriptions of different matching tasks are mapped to a few prompt tokens. In the generalization stage, the matching model explores the essential matching signals by being trained on diverse matching tasks. The high diversity of matching tasks prevents the model from fitting the data bias of any specific task, so the model can focus on learning the essential matching signals. Meanwhile, the prompt tokens obtained in the first stage help the model distinguish different task-specific matching signals. Experimental results on public datasets show that Match-Prompt can improve the multi-task generalization capability of PLMs in text matching, and yields better in-domain multi-task, out-of-domain multi-task, and new-task adaptation performance than task-specific models trained under the previous fine-tuning paradigm.
Machine learning systems deployed in the wild are often trained on a source distribution but deployed on a different target distribution. Unlabeled data can be a powerful point of leverage for mitigating these distribution shifts, as it is frequently much more available than labeled data. However, existing distribution shift benchmarks with unlabeled data do not reflect the breadth of scenarios that arise in real-world applications. In this work, we present the Wilds 2.0 update, which extends 8 of the 10 datasets in the Wilds benchmark of distribution shifts to include curated unlabeled data that would be realistically obtainable in deployment. For consistency, the labeled training, validation, and test sets, as well as the evaluation metrics, are exactly the same as in the original Wilds benchmark. These datasets span a wide range of applications (from histology to wildlife conservation), tasks (classification, regression, and detection), and modalities (photos, satellite images, microscope slides, text, molecular graphs). We systematically benchmark state-of-the-art methods that leverage unlabeled data, including domain-invariant, self-training, and self-supervised methods, and show that their success on Wilds 2.0 is limited. To facilitate method development and evaluation, we provide an open-source package that automates data loading and contains all of the model architectures and methods used in this paper. Code and leaderboards are available at https://wilds.stanford.edu.
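One of the unlabeled-data families benchmarked above, self-training, can be sketched as confidence-thresholded pseudo-labeling: a teacher trained on labeled source data labels the unlabeled target data, and a student retrains on the union. The synthetic source/target data, the 0.9 threshold, and the use of logistic regression below are illustrative assumptions, not a Wilds dataset or the benchmarked implementations.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, shift):
    """Covariate-shifted inputs with a fixed labeling rule."""
    X = rng.normal(size=(n, 2)) + shift
    y = (X[:, 0] + X[:, 1] > 1.0).astype(int)
    return X, y

Xs, ys = make_data(1000, shift=0.0)   # labeled source
Xt, yt = make_data(1000, shift=1.0)   # target; yt is never used for training

# Teacher: trained on labeled source only.
teacher = LogisticRegression().fit(Xs, ys)

# Keep only target points the teacher is confident about.
confidence = teacher.predict_proba(Xt).max(axis=1)
confident = confidence > 0.9
pseudo = teacher.predict(Xt[confident])

# Student: retrained on labeled source plus confident pseudo-labels.
student = LogisticRegression().fit(
    np.vstack([Xs, Xt[confident]]),
    np.concatenate([ys, pseudo]))
print(confident.sum(), "confident pseudo-labels used")
```

In practice the confidence threshold and the number of self-training rounds are hyperparameters, and (as the benchmark results above suggest) pseudo-label noise under shift can limit the gains.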
Ensemble-based debiasing methods have been shown to be effective in mitigating dataset biases by adjusting the learning target with the output of a bias-only model. In this paper, we focus on the bias-only model in these ensemble-based methods, which plays an important role but has not received much attention in the existing literature. Theoretically, we prove that debiasing performance can be damaged by inaccurate uncertainty estimations from the bias-only model. Empirically, we show that existing bias-only models fall short of producing accurate uncertainty estimations. Motivated by these findings, we propose to calibrate the bias-only model, yielding a three-stage ensemble-based debiasing framework consisting of bias modeling, model calibration, and debiasing. Experimental results on NLI and fact verification tasks show that our proposed three-stage debiasing framework consistently outperforms the traditional two-stage one in out-of-distribution accuracy.
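The calibration stage can be sketched with the common temperature-scaling recipe: a single scalar temperature is fitted on held-out data so that the bias-only model's confidence better matches its accuracy. The toy data (logits made artificially overconfident by a factor of 3), the grid search, and the function names below are illustrative assumptions, not the paper's method.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(T, logits, labels):
    """Negative log-likelihood of the labels under temperature T."""
    p = softmax(logits, T)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 91)):
    # Simple grid search on held-out data (in practice, LBFGS on T).
    return min(grid, key=lambda T: nll(T, logits, labels))

# Toy "bias-only model": its logits are 3x too sharp (overconfident).
rng = np.random.default_rng(0)
true_logits = rng.normal(size=(500, 3))
labels = np.array([rng.choice(3, p=softmax(l[None])[0]) for l in true_logits])
overconfident = 3.0 * true_logits

T = fit_temperature(overconfident, labels)
print(T)  # fitted temperature, well above 1, undoing the overconfidence
```

The calibrated probabilities `softmax(logits, T)` would then feed the debiasing stage in place of the raw bias-only outputs.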
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and can be adapted to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Although foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of a foundation model are inherited by all the models adapted from it downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of, owing to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.
Standard training via empirical risk minimization (ERM) can produce models that achieve high accuracy on average but low accuracy on certain groups, especially in the presence of spurious correlations between the input and label. Prior approaches that achieve high worst-group accuracy, like group distributionally robust optimization (group DRO), require expensive group annotations for each training point, whereas approaches that do not use such group annotations typically achieve unsatisfactory worst-group accuracy. In this paper, we propose a simple two-stage approach, JTT, that first trains a standard ERM model for several epochs, and then trains a second model that upweights the training examples the first model misclassified. Intuitively, this upweights examples from groups on which standard ERM models perform poorly, leading to improved worst-group performance. Averaged over four image classification and natural language processing tasks with spurious correlations, JTT closes 75% of the gap in worst-group accuracy between standard ERM and group DRO, while only requiring group annotations on a small validation set in order to tune hyperparameters.
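The two-stage recipe can be sketched on synthetic data with a spurious shortcut feature; the data generator, the upweighting factor `lam`, and the use of logistic regression below are illustrative assumptions rather than the paper's experimental setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 4000
core = rng.integers(0, 2, n)                        # true label signal
spurious = np.where(rng.random(n) < 0.9,            # 90% correlated shortcut
                    core, 1 - core)
X = np.column_stack([
    core + 1.0 * rng.normal(size=n),                # noisy core feature
    spurious + 0.1 * rng.normal(size=n),            # clean spurious feature
])
y = core

# Stage 1: standard ERM model; collect the examples it gets wrong.
erm = LogisticRegression().fit(X, y)
errors = erm.predict(X) != y

# Stage 2: retrain with the error set upweighted by a factor lam.
lam = 6.0
jtt = LogisticRegression().fit(X, y,
                               sample_weight=np.where(errors, lam, 1.0))

def worst_group_acc(model):
    """Minimum accuracy over the four (label, spurious) groups."""
    accs = []
    for c in (0, 1):
        for s in (0, 1):
            g = (core == c) & (spurious == s)
            accs.append((model.predict(X[g]) == y[g]).mean())
    return min(accs)

print(worst_group_acc(erm), worst_group_acc(jtt))
```

Because the ERM model leans on the clean spurious feature, its errors concentrate in the minority groups, so upweighting them pushes the second model toward the core feature and raises worst-group accuracy.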
Quantifying the topological similarities of different parts of urban road networks (URNs) enables us to understand urban growth patterns. While conventional statistics provide useful information about either a single node's direct neighbors or the entire network, such metrics fail to measure the similarity of subnetworks in a way that accounts for local indirect neighborhood relationships. In this study, we propose a graph-based machine learning method to quantify the spatial homogeneity of subnetworks. We apply the method to 11,790 urban road networks across 30 cities worldwide to measure the spatial homogeneity of road networks both within and across cities. We find that intra-city spatial homogeneity is highly associated with socioeconomic status, such as GDP and population growth. Moreover, the inter-city spatial homogeneity obtained by transferring models between cities reveals inter-city similarities in urban network structure, with structures in European cities transferring well to cities in the US and Asia. The socioeconomic development and inter-city similarity revealed by our method can be leveraged to understand cities and transfer insights between them. It also enables us to address urban policy challenges, including network planning in rapidly urbanizing areas and combating regional inequality.
Distribution shifts, where the training distribution differs from the test distribution, can substantially degrade the accuracy of machine learning (ML) systems deployed in the wild. Despite their ubiquity in real-world deployments, these distribution shifts are under-represented in the datasets widely used in the ML community today. To address this gap, we present Wilds, a curated benchmark of 10 datasets reflecting a diverse range of distribution shifts that naturally arise in real-world applications, such as shifts across hospitals for tumor identification, across camera traps for wildlife monitoring, and across time and location in satellite imaging and poverty mapping. On each dataset, we show that standard training yields substantially lower out-of-distribution than in-distribution performance. This gap remains even with models trained by existing methods for tackling distribution shifts, underscoring the need for new training methods that are more robust to the types of distribution shifts that arise in practice. To facilitate method development, we provide an open-source package that automates dataset loading, contains default model architectures and hyperparameters, and standardizes evaluations. Code and leaderboards are available at https://wilds.stanford.edu.
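The in-distribution vs. out-of-distribution gap described above can be illustrated on a toy shift. The synthetic rotated-boundary shift below is an assumption for illustration only and is unrelated to the actual shifts in the benchmark.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample(n, w):
    """Gaussian inputs labeled by the halfspace with normal vector w."""
    X = rng.normal(size=(n, 2))
    y = (X @ w > 0).astype(int)
    return X, y

w_train = np.array([1.0, 0.0])
w_shift = np.array([0.6, 0.8])        # boundary rotated ~53 degrees

X_tr, y_tr = sample(4000, w_train)
X_id, y_id = sample(2000, w_train)    # in-distribution test set
X_ood, y_ood = sample(2000, w_shift)  # shifted test set

model = LogisticRegression().fit(X_tr, y_tr)
acc_id = (model.predict(X_id) == y_id).mean()
acc_ood = (model.predict(X_ood) == y_ood).mean()
print(acc_id, acc_ood)                # accuracy drops sharply under the shift
```

The standard-training model is near-perfect in distribution but loses a large fraction of its accuracy under the shift, which is exactly the gap the benchmark's evaluation protocol measures per dataset.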